NSF PAR Search | NSF Public Access Repository

New characteristics of MiRNA and IsomiR interactions with mRNA

https://doi.org/10.1038/s41598-025-21561-x

Weston, Matthew; Ripan, Rony Chowdhury; Li, Xiaoman; Hu, Haiyan (October 2025, Scientific Reports)
Recurrent enhancer-promoter interactions across samples

https://doi.org/10.1101/2025.09.27.678855

Weston, Matthew; Gunjala, Satvik; Hu, Haiyan; Li, Xiaoman (September 2025, bioRxiv)

Enhancer-promoter interactions (EPIs) are fundamental to gene regulation, and understanding their recurrence across diverse biological samples is key to deciphering chromatin architecture. In this study, we systematically analyzed the recurrence of EPIs across 49 Hi-C and 95 HiChIP datasets. We found that the majority of EPIs identified in a given sample were also present in other samples, regardless of the assay type (Hi-C or HiChIP) or the enhancer annotations used. Interestingly, EPIs that appeared unique to individual samples were typically surrounded by fewer neighboring EPIs, suggesting they may not represent truly sample-specific interactions. Our findings indicate that most human EPIs have already been captured and that cells primarily reuse subsets of these shared EPIs across different cell types and conditions. This study provides new insights into the pervasive and reusable nature of EPIs in the human genome, with important implications for chromatin conformation studies.
Free, publicly-accessible full text available September 29, 2026
Deep learning inference of miRNA expression from bulk and single-cell mRNA expression

https://doi.org/10.1142/S021972002550009X

Ripan, Rony Chowdhury; Athaya, Tasbiraha; Li, Xiaoman; Hu, Haiyan (June 2025, Journal of Bioinformatics and Computational Biology)

Studying miRNA activity at the single-cell level presents a significant challenge due to the limitations of existing single-cell technologies in capturing miRNAs. To address this, we introduce two deep learning models, cross-modality (CM) and single-modality (SM), both based on encoder-decoder architectures. These models predict miRNA expression at both bulk and single-cell levels using mRNA data. We evaluated the performance of CM and SM against the state-of-the-art miRSCAPE approach, using both bulk and single-cell datasets. Our results demonstrate that both CM and SM outperform miRSCAPE in accuracy. Furthermore, incorporating miRNA target information substantially enhanced performance compared to models that utilized all genes. These models provide powerful tools for predicting miRNA expression from single-cell mRNA data.
Free, publicly-accessible full text available June 1, 2026
A deep learning method to integrate extracelluar miRNA with mRNA for cancer studies

https://doi.org/10.1093/bioinformatics/btae653

Athaya, Tasbiraha; Li, Xiaoman; Hu, Haiyan (November 2024, Bioinformatics)
Mathelier, Anthony (Ed.)

Motivation: Extracellular miRNAs (exmiRs) and intracellular mRNAs both can serve as promising biomarkers and therapeutic targets for various diseases. However, exmiR expression data is often noisy, and obtaining intracellular mRNA expression data usually involves intrusive procedures. To gain valuable insights into disease mechanisms, it is thus essential to improve the quality of exmiR expression data and develop noninvasive methods for assessing intracellular mRNA expression. Results: We developed CrossPred, a deep-learning multi-encoder model for the cross-prediction of exmiRs and mRNAs. Utilizing contrastive learning, we created a shared embedding space to integrate exmiRs and mRNAs. This shared embedding was then used to predict intracellular mRNA expression from noisy exmiR data and to predict exmiR expression from intracellular mRNA data. We evaluated CrossPred on three types of cancers and assessed its effectiveness in predicting the expression levels of exmiRs and mRNAs. CrossPred outperformed the baseline encoder-decoder model, exmiR or mRNA-based models, and variational autoencoder models. Moreover, the integration of exmiR and mRNA data uncovered important exmiRs and mRNAs associated with cancer. Our study offers new insights into the bidirectional relationship between mRNAs and exmiRs. Availability and implementation: The datasets and tool are available at https://doi.org/10.5281/zenodo.13891508.
PSPI: A deep learning approach for prokaryotic small protein identification

https://doi.org/10.3389/fgene.2024.1439423

Weston, Matthew; Hu, Haiyan; Li, Xiaoman (July 2024, Frontiers in Genetics)

Small proteins (SPs) are pivotal in various cellular functions such as immunity, defense, and communication. Despite their significance, identifying them is still in its infancy. Existing computational tools are tailored to specific eukaryotic species, leaving only a few options for SP identification in prokaryotes. In addition, these existing tools still have suboptimal performance in SP identification. To fill this gap, we introduce PSPI, a deep learning-based approach designed specifically for predicting prokaryotic SPs. We showed that PSPI had a high accuracy in predicting generalized sets of prokaryotic SPs and sets specific to the human metagenome. Compared with three existing tools, PSPI was faster and showed greater precision, sensitivity, and specificity not only for prokaryotic SPs but also for eukaryotic ones. We also observed that the incorporation of (n,k)-mers greatly enhances the performance of PSPI, suggesting that many SPs may contain short linear motifs. The PSPI tool, which is freely available at https://www.cs.ucf.edu/~xiaoman/tools/PSPI/, will be useful for identifying prokaryotic SPs, and it can be trained to identify other types of SPs as well.
Full Text Available
A survey of experimental and computational identification of small proteins

https://doi.org/10.1093/bib/bbae345

Beals, Joshua; Hu, Haiyan; Li, Xiaoman (July 2024, Briefings in Bioinformatics)

Small proteins (SPs) are typically characterized as eukaryotic proteins shorter than 100 amino acids and prokaryotic proteins shorter than 50 amino acids. Historically, they were disregarded because of the arbitrary size thresholds to define proteins. However, recent research has revealed the existence of many SPs and their crucial roles. Despite this, the identification of SPs and the elucidation of their functions are still in their infancy. To pave the way for future SP studies, we briefly introduce the limitations and advancements in experimental techniques for SP identification. We then provide an overview of available computational tools for SP identification, their constraints, and their evaluation. Additionally, we highlight existing resources for SP research. This survey aims to initiate further exploration into SPs and encourage the development of more sophisticated computational tools for SP identification in prokaryotes and microbiomes.
A computational modeling of pri-miRNA expression

https://doi.org/10.1371/journal.pone.0290768

Zheng, Hansi; Wang, Saidi; Li, Xiaoman; Hu, Haiyan (January 2024, PLOS ONE)
Helmer-Citterich, Manuela (Ed.)
MicroRNAs (miRNAs) play crucial roles in gene regulation. Most studies focus on mature miRNAs, which leaves many unknowns about primary miRNAs (pri-miRNAs). To fill the gap, we attempted to model the expression of pri-miRNAs in 1829 primary cell types, cell lines, and tissues in this study. We demonstrated that the expression of pri-miRNAs can be modeled well by the expression of specific sets of mRNAs, which we termed their associated mRNAs. These associated mRNAs differ from their corresponding target mRNAs and are enriched with specific functions. Most associated mRNAs of a miRNA are shared across conditions, while on average, about one-fifth of the associated mRNAs are condition-specific. Our study shed new light on understanding miRNA biogenesis and general gene transcriptional regulation.
Full Text Available
Are the predicted known bacterial strains in a sample really present? A case study

https://doi.org/10.1371/journal.pone.0291964

Ventolero, Minerva; Wang, Saidi; Hu, Haiyan; Li, Xiaoman (October 2023, PLOS ONE)
El Allali, Achraf (Ed.)
With mutations constantly accumulating in bacterial genomes, it is unclear whether the previously identified bacterial strains are really present in an extant sample. To address this question, we did a case study on the known strains of the bacterial species S. aureus and S. epidermis in 68 atopic dermatitis shotgun metagenomic samples. We evaluated the likelihood of the presence of all sixteen known strains predicted in the original study and by two popular tools in this study. We found that even with the same tool, only two known strains were predicted by both the original study and this study. Moreover, none of the sixteen known strains was likely present in these 68 samples. Our study thus indicates the limitation of the known-strain-based studies, especially those on rapidly evolving bacterial species. It implies the unlikely presence of the previously identified known strains in a current environmental sample. It also calls for de novo bacterial strain identification directly from shotgun metagenomic reads.
Full Text Available
Multimodal deep learning approaches for single-cell multi-omics data integration

https://doi.org/10.1093/bib/bbad313

Athaya, Tasbiraha; Ripan, Rony Chowdhury; Li, Xiaoman; Hu, Haiyan (August 2023, Briefings in Bioinformatics)

Integrating single-cell multi-omics data is a challenging task that has led to new insights into complex cellular systems. Various computational methods have been proposed to effectively integrate these rapidly accumulating datasets, including deep learning. However, despite the proven success of deep learning in integrating multi-omics data and its better performance over classical computational methods, there has been no systematic study of its application to single-cell multi-omics data integration. To fill this gap, we conducted a literature review to explore the use of multimodal deep learning techniques in single-cell multi-omics data integration, taking into account recent studies from multiple perspectives. Specifically, we first summarized different modalities found in single-cell multi-omics data. We then reviewed current deep learning techniques for processing multimodal data and categorized deep learning-based integration methods for single-cell multi-omics data according to data modality, deep learning architecture, fusion strategy, key tasks and downstream analysis. Finally, we provided insights into using these deep learning models to integrate multi-omics data and better understand single-cell biological mechanisms.
SMS: A Novel Approach for Bacterial Strain Analysis in Multiple Samples

https://doi.org/10.26502/jbsb.5107065

Wang, Saidi; Fatimae Ventolero, Minerva; Hu, Haiyan; Li, Xiaoman (January 2023, Journal of Bioinformatics and Systems Biology)

Full Text Available
